ILU Smoothers for AMG with Scaled Triangular Factors
ILU smoothers are effective in the algebraic multigrid (AMG) V-cycle for
reducing high-frequency components of the residual error. However, direct
triangular solves are comparatively slow on GPUs. Previous work by Chow and
Patel (2015) and Anzt et al. (2015) demonstrated the advantages of Jacobi
relaxation as an alternative. Depending on the threshold and fill-level
parameters chosen, the factors can be highly non-normal, and Jacobi is then
unlikely to converge in a small number of iterations. The Ruiz algorithm applies row or
row/column scaling to U in order to reduce the departure from normality. The
inherently sequential solve is replaced with a Richardson iteration. There are
several advantages beyond the lower compute time. Scaling is performed locally
for a diagonal block of the global matrix because it is applied directly to the
factor. An ILUT Schur complement smoother maintains a constant GMRES iteration
count as the number of MPI ranks increases and thus parallel strong-scaling is
improved. The new algorithms are included in hypre, and achieve improved time
to solution for several Exascale applications, including the Nalu-Wind and
PeleLM pressure solvers. For large problem sizes, GMRES+AMG with iterative
triangular solves executes at least five times faster than with direct
triangular solves on massively parallel GPUs.
Comment: v2 updated citation information; v3 updated results; v4 abstract
updated, new results added; v5 new experimental analysis and results added
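The idea of replacing a sequential triangular solve with a fixed number of Jacobi sweeps can be illustrated with a minimal sketch. This is not the hypre implementation and it omits the Ruiz scaling step; the function names, the dense list-of-rows storage, and the small test matrix are assumptions made purely for illustration. Each sweep applies x <- x + D^{-1}(b - U x), which is a Richardson iteration on the diagonally scaled system, and every row update within a sweep is independent of the others.

```python
def jacobi_upper_solve(U, b, sweeps):
    """Approximate the solution of U x = b with Jacobi sweeps.

    U is a dense upper-triangular matrix stored as a list of rows
    (an illustrative choice; real factors are sparse).
    Each sweep computes x <- x + D^{-1} (b - U x), with D = diag(U).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        # Residual r = b - U x. Each entry depends only on the previous
        # iterate, so all rows could be updated in parallel.
        r = [b[i] - sum(U[i][j] * x[j] for j in range(i, n)) for i in range(n)]
        x = [x[i] + r[i] / U[i][i] for i in range(n)]
    return x


def back_substitution(U, b):
    """Reference direct solve of U x = b (inherently sequential)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x


# Hypothetical 3x3 upper-triangular factor and right-hand side.
U = [[4.0, 1.0, 0.5],
     [0.0, 3.0, 1.0],
     [0.0, 0.0, 2.0]]
b = [1.0, 2.0, 3.0]

x_direct = back_substitution(U, b)
x_iter = jacobi_upper_solve(U, b, sweeps=3)
max_diff = max(abs(p - q) for p, q in zip(x_direct, x_iter))
```

For a triangular matrix the Jacobi iteration matrix is strictly triangular and therefore nilpotent, so n sweeps reproduce the direct solve up to rounding. The practical question raised in the abstract is how quickly the error decays when far fewer than n sweeps are used, which is where the non-normality of the factor matters.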
GPU-resident sparse direct linear solvers for alternating current optimal power flow analysis
Integrating renewable resources within the transmission grid at a wide scale poses significant challenges for economic dispatch as it requires analysis with more optimization parameters, constraints, and sources of uncertainty. This motivates the investigation of more efficient computational methods, especially those for solving the underlying linear systems, which typically take more than half of the overall computation time. In this paper, we present our work on sparse linear solvers that take advantage of hardware accelerators, such as graphics processing units (GPUs), and improve the overall performance when used within economic dispatch computations. We treat the problems as sparse, which allows for faster execution but also makes the implementation of numerical methods more challenging. We present the first GPU-native sparse direct solver that can execute on both AMD and NVIDIA GPUs. We demonstrate significant performance improvements when using high-performance linear solvers within alternating current optimal power flow (ACOPF) analysis. Furthermore, we demonstrate the feasibility of getting significant performance improvements by executing the entire computation on GPU-based hardware. Finally, we identify outstanding research issues and opportunities for even better utilization of heterogeneous systems, including those equipped with GPUs.
Scalability of high-performance PDE solvers
Performance tests and analyses are critical to effective HPC software
development and are central components in the design and implementation of
computational algorithms for achieving faster simulations on existing and
future computing architectures for large-scale application problems. In this
paper, we explore performance and space-time trade-offs for important
compute-intensive kernels of large-scale numerical solvers for PDEs that govern
a wide range of physical applications. We consider a sequence of PDE-motivated
bake-off problems designed to establish best practices for efficient high-order
simulations across a variety of codes and platforms. We measure peak
performance (degrees of freedom per second) on a fixed number of nodes and
identify effective code optimization strategies for each architecture. In
addition to peak performance, we identify the minimum time to solution at 80%
parallel efficiency. The performance analysis is based on spectral and p-type
finite elements but is equally applicable to a broad spectrum of numerical PDE
discretizations, including finite difference, finite volume, and h-type finite
elements.
Comment: 25 pages, 54 figures